László Dudás, MTA SZTAKI, ldudas@info.ilab.sztaki.hu
Zsolt Fekete, MTA SZTAKI, zsfekete@info.ilab.sztaki.hu
Julianna Göbölös-Szabó, MTA SZTAKI, gszj@info.ilab.sztaki.hu, PRIMARY
András Radnai, MTA SZTAKI, aradnai@info.ilab.sztaki.hu
Ágnes Salánki, MTA SZTAKI, salankia@info.ilab.sztaki.hu
Adrienn Szabó, MTA SZTAKI, aszabo@info.ilab.sztaki.hu
Gábor Szucs, MTA SZTAKI, szgabbor@info.ilab.sztaki.hu
Student Team: NO
Owlap Analytics Pro was developed by our team specifically for the VAST Challenge 2012.
Prezi, http://prezi.com/, used by permission of the Prezi Team.
Prezi is a cloud-based (SaaS) presentation software and storytelling tool for exploring and sharing ideas on a virtual canvas. Prezi is distinguished by its zooming user interface (ZUI), which enables users to zoom in and out of their presentation media. Prezi allows users to display and navigate through information within a 2.5D space on the Z-axis. (from Wikipedia)
Video:
Answers to Mini-Challenge 1 Questions:
MC 1.1 Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?
We observed two types of disorder at 14:00 on February 2:
1. In some regions a high ratio of machines are not working.
Altogether about 9% of the machines are out of order (79761 of 888977). We attribute this to two phenomena: in some western regions the workday had not started yet, while other regions show suspicious characteristics.
In Datacenter-5 (in the headquarters region) a large number of servers (49240 of 51325) are out of order. Unexpectedly, all 5 workstations that belong to this facility are functioning well at this time.
Illustration 1: 02.02.2012, 14:00, dc-5: Number of servers and workstations, colored by policy status
In Region 25 some of the facilities seem to be unavailable. None of the computers in the following branches provide any data: branch 2, 3, 4, 8, 9, 23, 27, 30, 33, 39, 44, 47. The rest of the facilities (headquarters and all other branches under branch-50) are online and working according to the global trend.
Illustration 2: 02.02.2012, 14:00, region-25: geospatial distribution of unavailable and healthy machines
2. A relatively high number of computers are not healthy.
Approximately 16.2% of all computers (144311 machines) show moderate policy deviance and 3034 show serious deviance.
Especially worrying, in regions 5 and 10 none of the machines are healthy. Note that Datacenter-5 is located within the area of Region-5, although these two areas have distinct problems.
Illustration 3: 02.02.2012, 14:00: average policy status in regions
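The shares quoted in this answer can be re-derived from the raw counts. The following is only a small sketch of that arithmetic; the counts are taken from the text above, while the helper name `share` and the rounding to one decimal are our own choices for illustration.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# Counts as reported in the answer to MC 1.1.
TOTAL_MACHINES = 888977
OUT_OF_ORDER = 79761
MODERATE_DEVIANCE = 144311
SERIOUS_DEVIANCE = 3034
DC5_SERVERS = 51325
DC5_SERVERS_DOWN = 49240

out_of_order_pct = share(OUT_OF_ORDER, TOTAL_MACHINES)
moderate_pct = share(MODERATE_DEVIANCE, TOTAL_MACHINES)
serious_pct = share(SERIOUS_DEVIANCE, TOTAL_MACHINES)
dc5_down_pct = share(DC5_SERVERS_DOWN, DC5_SERVERS)

print(out_of_order_pct, moderate_pct, serious_pct, dc5_down_pct)
```

Under these counts the out-of-order share rounds to roughly 9% and the moderate-deviance share to roughly 16.2%, matching the figures in the text; the Datacenter-5 outage affects about 96% of its servers.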
MC 1.2 Use your visualization tools to look at how the network's status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?
1. Blackout in Headquarters' Datacenter-5.
At the beginning of the period all the servers and 2 workstations are offline. Only 3 workstations provide any data from this area; the 2 remaining workstations are turned on later.
Illustration 4: Policy status distribution in time (Axis-X) in dc-5: high ratio of unavailable machines at the beginning
At 4:45 AM (10:45 BMT) 240 servers (of various types) turn on for 60 minutes. However, no suspicious activity or traffic can be observed in this period.
Later all machines start to work again.
Illustration 5: dc-5: 240 servers are turned on during the blackout (green pattern surrounded by blue bars)
Illustration 6: Region25: blackout during the first day. Availability is determined at the facility level, as can be seen on the map (blue means NA, green means healthy)
Illustration 7: Unhealthy regions: region-5 and region-10 do not contain any healthy machines, even at the start of the period
4. Unusual network traffic in Region-10, generated by tellers
Normally tellers should have fewer than 5 connections on average outside business hours; however, in region-10 we observed that tellers show an unexpectedly high number of connections during the nights.
On the first night this phenomenon can be observed only in a few facilities, while on the second night all facilities show this anomalous behaviour.
Illustration 8: Average connection number of teller workstations in region-10: unexpected network traffic during the night
Illustration 9: Region10: average connection number examined by facility. Two types of behaviour can be observed, e.g. branch-43 and branch-44
The exact details for the night of February 2 are given in the following table:
Branch | Starting time (local time) | End (local time) | Average connection number
Branch-6 | 02:15:00 AM | 05:00:00 AM | 10 - 15
Branch-24 | 03:15:00 AM | 05:00:00 AM | 30 - 40
Branch-43 | 03:15:00 AM | 05:00:00 AM | 40 - 50
Branch-57 | 03:15:00 AM | 05:00:00 AM | 15 - 20
Branch-76 | 04:15:00 AM | 05:00:00 AM | 10 - 15
Branch-89 | 04:15:00 AM | 05:00:00 AM | 15 - 25
Branch-106 | 04:15:00 AM | 05:00:00 AM | 30 - 35
On the second night all 250 facilities generate increased traffic. This event begins at 2:15 AM (local time) and ends at 5:00 AM. The average connection number in this time interval is about 25.
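The rule behind this anomaly (tellers averaging fewer than 5 connections outside business hours) can be expressed as a simple filter. The sketch below is not the query used in our tool; the business-hours window of 07:00 to 18:00, the tuple layout of the samples and the function names are assumptions made for illustration, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import time

OFF_HOURS_LIMIT = 5            # from the text: fewer than 5 connections expected off-hours
BUSINESS_START = time(7, 0)    # assumption: start of business hours (local time)
BUSINESS_END = time(18, 0)     # assumption: end of business hours (local time)

def is_off_hours(local_time):
    """True if local_time falls outside the assumed business-hours window."""
    return local_time < BUSINESS_START or local_time >= BUSINESS_END

def flag_night_traffic(samples):
    """Return {branch: (first_flagged_time, last_flagged_time)}.

    samples: iterable of (branch, local_time, avg_teller_connections).
    A sample is flagged when it is off-hours and reaches the limit.
    """
    flagged = defaultdict(list)
    for branch, local_time, avg_connections in samples:
        if is_off_hours(local_time) and avg_connections >= OFF_HOURS_LIMIT:
            flagged[branch].append(local_time)
    return {branch: (min(times), max(times)) for branch, times in flagged.items()}

# Hypothetical samples, loosely modelled on the two behaviours in Illustration 9.
samples = [
    ("branch-43", time(3, 0), 2.0),
    ("branch-43", time(3, 15), 45.0),
    ("branch-43", time(4, 45), 42.0),
    ("branch-43", time(10, 0), 60.0),   # busy, but during business hours
    ("branch-44", time(3, 15), 1.5),    # quiet night, not flagged
]
night_anomalies = flag_night_traffic(samples)
print(night_anomalies)
```

Note that the returned end time is the last flagged sample, so with periodic sampling the reported end of an event would be the timestamp of the following sample.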
5. Infection characteristics
The trend shows that more and more computers become "sick", which means that they are no longer in the healthy policy state. While servers and ATMs get infected continuously and uniformly during the 2-day period, the workstation charts show two jumps, which are caused by people arriving at work in the morning and turning their computers on.
Illustration 10: Infection trend in the whole data set, separated by machine classes.
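A trend like the one in Illustration 10 comes down to counting the non-healthy machines per time step and machine class. The following is a minimal sketch of such a count, not the code of our tool; it assumes that policy status is an integer where 1 means healthy and larger values mean deviation, and the record layout and sample values are hypothetical.

```python
from collections import Counter

HEALTHY = 1  # assumption: policy status 1 is healthy, larger values indicate deviation

def unhealthy_trend(reports):
    """Count non-healthy machines per (timestamp, machine_class).

    reports: iterable of (timestamp, machine_class, policy_status) tuples,
    one tuple per machine per time step.
    """
    counts = Counter()
    for timestamp, machine_class, policy_status in reports:
        if policy_status > HEALTHY:
            counts[(timestamp, machine_class)] += 1
    return counts

# Hypothetical reports for two time steps.
reports = [
    ("2012-02-02 08:00", "server", 2),
    ("2012-02-02 08:00", "server", 1),
    ("2012-02-02 08:00", "workstation", 1),
    ("2012-02-02 08:15", "server", 2),
    ("2012-02-02 08:15", "server", 3),
    ("2012-02-02 08:15", "workstation", 2),
]
trend = unhealthy_trend(reports)
print(sorted(trend.items()))
```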
However, we observed another suspicious fact: on the second morning several computers show deviance after being turned on, although they were not infected (or at least less infected) on the previous day before shutting down. This anomaly can be explained by the fact that most computers connect to a server after being turned on. Since a large number of servers are also infected, the probability of the virus spreading by propagation is increased as well.
Illustration 11: Policy status transitions according to local time: on the second morning the newly turned-on workstations show a higher deviation rate than they did on the previous day right before shutting down
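The comparison behind Illustration 11 can be sketched as a lookup of each workstation's last status before shutdown against its first status after startup. The sketch below is illustrative only; it assumes integer policy statuses where a larger value is worse, and the function names, dictionary layout and machine identifiers are hypothetical.

```python
from collections import Counter

def status_transitions(evening, morning):
    """Count (evening_status, morning_status) pairs.

    evening: {machine_id: last policy status before shutdown}
    morning: {machine_id: first policy status after startup}
    Only machines present in both snapshots are compared.
    """
    common = evening.keys() & morning.keys()
    return Counter((evening[m], morning[m]) for m in common)

def worsened(evening, morning):
    """Return the sorted ids of machines whose status got worse overnight."""
    common = evening.keys() & morning.keys()
    return sorted(m for m in common if morning[m] > evening[m])

# Hypothetical snapshots; assumption: 1 = healthy, larger = worse.
evening = {"ws-a": 1, "ws-b": 1, "ws-c": 2}
morning = {"ws-a": 2, "ws-b": 1, "ws-c": 2, "ws-d": 3}

transitions = status_transitions(evening, morning)
worse = worsened(evening, morning)
print(transitions, worse)
```

A large count in the pairs where the morning status exceeds the evening status would correspond to the elevated deviation rate described above.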
A Prezi presentation about the anomalies